Abstract

We study the problem of finding small trees. Classical network design problems 
are considered with the additional constraint that only a specified number k of nodes 
are required to be connected in the solution. A prototypical example is the kMST
problem in which we require a tree of minimum weight spanning at least k nodes in
an edge-weighted graph. We show that the kMST problem is NP-hard even for points
in the Euclidean plane. We provide approximation algorithms with performance ratio
$2\sqrt{k}$ for the general edge-weighted case and $O(k^{1/4})$ for the case of points in the plane.
Polynomial-time exact solutions are also presented for the class of decomposable graphs 
which includes trees, series-parallel graphs, and bounded bandwidth graphs, and for 
points on the boundary of a convex region in the Euclidean plane. 

We also investigate the problem of finding short trees, and more generally, that of 
finding networks with minimum diameter. A simple technique is used to provide a 
polynomial-time solution for finding k-trees of minimum diameter. We identify easy
and hard problems arising in finding short networks using a framework due to T. C. 
Hu. 
1 Introduction



1.1 Motivation: small trees 

The oil reconnaissance boats are back from their final trip off the coast of Norway¹, and
present you with a detailed map of the seas surrounding the coastline. Marked in this map 
are locations which are believed to have a good chance of containing oil under the sea bed. 
Your company has a limited number of oil rigs that it is willing to invest in the effort. Your 
problem is to position these oil rigs at marked places so that the cost of laying down pipelines 
between these rigs is minimized. The problem at hand can be modeled as follows: Given 
an edge-weighted graph and a specified number k, find a tree of minimum weight spanning
at least k nodes. Since the edge weights are nonnegative, we may assume that a solution to the
problem is a tree spanning exactly k nodes. We call this problem the k-Minimum Spanning Tree
(or the kMST) problem. In this paper, we study such classical network-design problems as the
MST problem with the additional constraint that only a specified number of nodes need to be
incorporated into the network. Unlike the MST problem, which admits a polynomial-time
solution [4, 22, 25], the kMST problem is considerably harder to solve².

Theorem 1.1 The kMST problem is NP-complete³.

The above theorem holds even when all the edge weights are drawn from the set {1, 2, 3} (or 
any set containing three distinct values). It is not hard to show a polynomial-time solution 
for the case of two distinct weights. The problem remains NP-hard even for the class of 
planar graphs as well as for points in the plane. 

1.2 Approximation algorithms 

A ρ-approximation algorithm for a minimization problem is one that delivers a solution
of value at most ρ times the minimum. Consider a generalization of the kMST problem,
the k-Steiner tree problem: given an edge-weighted graph, an integer k and a subset of
at least k vertices specified as terminals, find a minimum-weight tree spanning at least k
terminals. We can apply approximation results for the kMST problem to this problem by
considering the auxiliary complete graph on the terminals with edges weighted by shortest-path
distances. A ρ-approximation for the kMST problem on the auxiliary graph yields
a 2ρ-approximation for the k-Steiner tree problem, since doubling and shortcutting an optimal
k-Steiner tree gives a tree on k of its terminals in the auxiliary graph of at most twice its weight.
Therefore we focus on approximations for the kMST problem. We provide the first
approximation algorithm for this problem.
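The auxiliary graph above is the metric closure of the terminal set. As a minimal sketch, assuming the input graph is given as a list of `(u, v, w)` triples on vertices `0..n-1` (an input format chosen here only for illustration) and that `terminal_metric_closure` is a hypothetical helper name, the following Python computes it with the Floyd-Warshall algorithm; any all-pairs shortest-path method would serve equally well.

```python
import math

def terminal_metric_closure(n, edges, terminals):
    """Return {(s, t): shortest-path distance} for every pair of terminals.

    n: number of vertices (labelled 0..n-1); edges: (u, v, w) triples of an
    undirected graph with nonnegative weights; terminals: list of vertices.
    """
    # Floyd-Warshall all-pairs shortest paths, O(n^3) time.
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        # Keep the cheapest of any parallel edges.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    # The auxiliary complete graph: one weighted edge per pair of terminals.
    return {(s, t): dist[s][t]
            for idx, s in enumerate(terminals)
            for t in terminals[idx + 1:]}
```

A kMST approximation run on these pairwise distances yields a tree on k terminals; expanding each of its edges into the corresponding shortest path gives a connected subgraph of no greater weight, from which a k-Steiner tree can be extracted.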

Theorem 1.2 There is a polynomial-time algorithm that, given an undirected graph G on 
n nodes with nonnegative weights on its edges, and a positive integer k < n, constructs a 
tree spanning at least k nodes of weight at most $2\sqrt{k}$ times that of a minimum-weight tree
spanning any k nodes. 

The algorithm in the above theorem is based on a combination of a greedy technique
that constructs trees using edges of small cost and a shortest-path heuristic that merges
trees when the number of trees to be merged is small. The analysis of the performance
ratio is based on a solution-decomposition technique [10, 21, 26, 27] that uses the structure
of the optimal solution to derive a bound on the cost of the solution constructed by the
approximation algorithm.

¹ Story reconstructed from a communication from Naveen Garg [16].
² The main theorems in this paper are stated in the introduction and proved in later sections.
³ This result was independently obtained by Lozovanu and Zelikovsky [23].

The above theorem provides a $4\sqrt{k}$-approximation algorithm for the k-Steiner tree problem
as well. Moreover, we can construct an example that demonstrates that the performance
guarantee of the approximation algorithm is tight to within a constant factor.

We can derive a better approximation algorithm for the case of points in the Euclidean 
plane. 

Theorem 1.3 There is a polynomial-time algorithm that, given n points in the Euclidean 
plane, and a positive integer k < n, constructs a tree spanning at least k of these points 
such that the total length of the tree is at most $O(k^{1/4})$ times that of a minimum-length tree
spanning any k of the points. 

As before, we can construct an example showing that the performance ratio of the 
algorithm in Theorem 1.3 is tight. Our proof of Theorem 1.3 also yields as a corollary an
approximation algorithm for the rectilinear kMST problem.

Corollary 1.4 There is a polynomial-time algorithm that, given n points in the plane, and 
a positive integer k < n, constructs a rectilinear tree spanning at least k of these points such 
that the total length of the tree is at most $O(k^{1/4})$ times that of a minimum-length rectilinear
tree spanning any k of the points. 

1.3 Exact algorithms: special cases 

Since the kMST problem is NP-complete even for the class of planar graphs, we focus on
special classes of graphs and provide exact solutions that run in polynomial time. Bern,
Lawler and Wong [7] introduced the notion of decomposable graphs. A class of decomposable
graphs is defined using a finite number of primitive graphs and a finite collection of
binary composition rules. Examples of decomposable graphs include trees⁴, series-parallel
graphs and bounded-bandwidth graphs. We use a dynamic programming technique to prove 
the following theorem. 

Theorem 1.5 For any class of decomposable graphs, there is an $O(nk^2)$-time algorithm
for solving the kMST problem. 

Though the kMST problem is hard for arbitrary configurations of points in the plane,
we have the following result. 

Theorem 1.6 There is a polynomial-time algorithm for solving the kMST problem for the 
case of points in the Euclidean plane that lie on the boundary of a convex region. 

⁴ A polynomial-time algorithm for trees was also independently obtained by Lozovanu and Zelikovsky [23].
The proof of the above theorem uses a monotonicity property of the optimal tree along
with a degree constraint on an optimal solution. This allows us to apply dynamic programming
to find the exact solution. Several researchers in computational geometry have
presented exact algorithms for choosing k points that minimize other objectives such as 
diameter, perimeter, area and volume [2, 12, 13, 14]. 

1.4 Short trees 

Keeping the longest path in a network small is often an important consideration in network 
design. We investigate the problem of finding networks with small diameter. Recall that the 
diameter of a tree is the maximum distance (path length) between any pair of nodes in the 
tree. The problem of finding a minimum-diameter spanning tree of an edge-weighted graph
was shown to be polynomially solvable by Camerini, Galbiati and Maffioli [9] when the edge 
weights are nonnegative. They also show that the problem becomes NP-hard when negative 
weights are allowed. Camerini and Galbiati [8] have proposed polynomial-time algorithms 
for a bounded path tree problem on graphs with nonnegative edge weights. Their result 
can be used to show that the minimum-diameter spanning tree problem as well as its 
natural generalization to Steiner trees can be solved in polynomial time. We use a similar 
technique to show that the following minimum-diameter k-tree problem is polynomially
solvable: given a graph with nonnegative edge weights, find a tree of minimum diameter 
spanning at least k nodes. 

Theorem 1.7 There is a polynomial-time algorithm for the minimum-diameter k-tree prob- 
lem on graphs with nonnegative edge weights. 

We investigate easy and hard results in finding short networks. For this, we use a 
framework due to T. C. Hu [19]. In this framework, we are given a graph with nonnegative 
distance values $d_{ij}$ and nonnegative requirement values $r_{ij}$ between every pair of nodes i
and j in the graph. The communication cost of a spanning tree is defined to be the sum over
all pairs of nodes i, j of the product of the distance between i and j in the tree under d and
the requirement $r_{ij}$. The objective is to find a spanning tree with minimum communication
cost. Hu considered the case when all the d values are one and showed that a Gomory-Hu 
cut tree [18] using the r values as capacities is an optimal solution. Hu also considered the 
case when all the r values are one and derived sufficient conditions under which the optimal 
tree is a star. The general version of the latter problem is NP-hard [9, 20]. 

We define the diameter cost of a spanning tree to be the maximum, over all pairs of
nodes i, j, of the distance between i and j in the tree multiplied by $r_{ij}$. In Table 1, we present
current results in this framework. All $r_{ij}$ and $d_{ij}$ values are assumed to be nonnegative.
The first two rows of the table examine the cases when either of the two parameters is
uniform-valued. The last two rows illustrate that the two problems can become NP-complete
when both the parameters are two-valued.
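To make the two objectives concrete, here is a small Python sketch that evaluates both costs for a given spanning tree. It assumes the tree is a list of `(u, v, d_uv)` triples on nodes `0..n-1`, that the requirements are a dict keyed by unordered pairs `(i, j)` with `i < j` (missing pairs count as zero), and that each unordered pair is counted once in the sum; the function names are hypothetical.

```python
from itertools import combinations

def tree_distances(n, tree_edges):
    """All-pairs path lengths in a tree given as (u, v, d_uv) triples."""
    adj = {v: [] for v in range(n)}
    for u, v, d in tree_edges:
        adj[u].append((v, d))
        adj[v].append((u, d))
    dist = {}
    for src in range(n):
        # Iterative DFS: in a tree, the unique path gives the distance.
        seen = {src: 0}
        stack = [src]
        while stack:
            x = stack.pop()
            for y, d in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + d
                    stack.append(y)
        for dst, length in seen.items():
            dist[src, dst] = length
    return dist

def communication_and_diameter_cost(n, tree_edges, r):
    """Return (communication cost, diameter cost) of a spanning tree (n >= 2)."""
    dist = tree_distances(n, tree_edges)
    # One term d_T(i, j) * r_ij per unordered pair of nodes.
    terms = [dist[i, j] * r.get((i, j), 0) for i, j in combinations(range(n), 2)]
    return sum(terms), max(terms)
```

With all requirements equal to one, the diameter cost reduces to the ordinary tree diameter, which is the setting of the minimum-diameter spanning tree problem mentioned earlier.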

1.5 Short small trees 

We consider the k-tree versions of the minimum-communication-cost and minimum-diameter-cost
spanning tree problems and show the following hardness result.
r_ij        d_ij        Communication cost              Diameter cost
Arbitrary   {a}         Cut-tree [19]                   Open
{a}         Arbitrary   NP-complete [20]                Poly-time [9]
{a, b}      {0, c}      Cut-tree variant (this paper)   Poly-time (this paper)
{a, 4a}     {c, d}      NP-complete [20]                NP-complete (this paper)

Table 1: Results on minimum-communication-cost spanning trees and minimum-diameter-cost
spanning trees.

Theorem 1.8 The minimum-communication k-tree problem and the minimum-diameter
k-tree problem are both hard to approximate within any factor even when all the $d_{ij}$ values
are one and the $r_{ij}$ values are nonnegative.

In the next section, we present the NP-completeness results. Section 3 contains the $2\sqrt{k}$-approximation
for the kMST problem. In Section 4, we present the stronger result for the
case of points in the plane. In Section 5, we address polynomially solvable cases of the problem.
In Section 6, we prove our results on short trees. We close with a discussion of directions
for future research.

2 NP-completeness results 

In this section we show that the kMST problem is NP-hard by reducing the Steiner tree
problem to it. The Steiner tree problem is known to be NP-hard [15]. As an instance of
the Steiner tree problem we are given an undirected graph G, a set of terminals R (which
is a subset of the vertex set of G) and a positive integer M, and the question is whether
there exists a tree spanning R and containing at most M edges. We transform this input
to an instance (G', k) of the kMST problem as follows: we let $X = |V(G)| - |R| + 1$ and
connect each terminal of G to a distinct path of X new vertices, the path consisting of
zero-weight edges. We assign weight one to the already existing edges of G and set the
weight between all other pairs of vertices to $\infty$ (a very large number). This is the graph G'
(see Figure 1). We set $k = |R| \cdot X$. If there exists a Steiner tree in G spanning the set
R and containing at most M edges, then it is easy to construct a kMST of weight at most
M in G'. Conversely, by our choice of k and X, any kMST in G' must contain at least one
node from the path corresponding to each terminal in R. Hence any kMST can be used to
derive a Steiner tree for R in G. This completes the reduction. Extensions of hardness to
the case of planar graphs and points in the plane follow in a similar way from the hardness
of the Steiner tree problem in these restricted cases. Given a planar embedding of G we
can create an embedded version of G' since only paths are added.

The NP-hardness holds even when all the edge costs are from the set {1, 2, 3}. The
reduction for this case is similar to the above. Without loss of generality we assume that
in the given instance of the Steiner tree problem, G is connected and $M < |V| - 1$. We
let $X = |V(G)| - |R| + 1$ as before, and connect each terminal of G to a distinct set of X
vertices by edges of weight one. We set the original edges of G to have weight two and all
other edges to have weight three. We choose $k = |R| \cdot X + M + 1$ and the bound on the
cost of the kMST to be $|R| \cdot X + 2M$. If there exists a Steiner tree in G spanning the set
R and containing at most M edges, then it is easy to construct a kMST of weight at most
$|R| \cdot X + 2M$ in G'. This is done by connecting all the newly added vertices to the Steiner
tree using the weight-one edges and then picking up more vertices (note that the graph is
connected and $M < |V| - 1$) using the weight-two edges until there are $|R| \cdot X + M + 1$
vertices. Conversely, suppose there exists a kMST of weight at most $|R| \cdot X + 2M$ in G'. This
kMST cannot contain an edge of weight three: it has only $k - 1 = |R| \cdot X + M$ edges,
and if it contained an edge of weight three then it would have to contain at least $|R| \cdot X + 1$
edges of weight one, but there are only $|R| \cdot X$ edges of weight one in G'. Further, the kMST
has at most M edges of weight two, so it must use all $|R| \cdot X$ edges of weight one and hence
span R; therefore there must exist a Steiner tree in G spanning R and containing at most M edges.

When there are only two distinct edge costs, the kMST problem can be solved in
polynomial time. The basic idea is the following: Let $w_1$ and $w_2$ denote the two edge
weights, where $w_1 < w_2$. Construct an edge subgraph $G_1$ of G containing all the edges
of weight $w_1$. Choose a minimum number, say r, of the connected components of $G_1$
to obtain a total of at least k nodes. Construct a spanning tree for each chosen component
(pruning the last one, if necessary, so that exactly k nodes are spanned) and
connect the trees together into a single tree by adding exactly $r - 1$ edges of weight $w_2$. It
is straightforward to verify that the resulting solution is optimal.
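The two-weight procedure above amounts to counting components. A minimal Python sketch, assuming (as the construction implicitly does) that any two chosen components of $G_1$ can be joined by an edge of weight $w_2$, for example because G is complete; the function name is hypothetical, and it returns only the optimal cost $(k - r)w_1 + (r - 1)w_2$ without building the tree.

```python
def two_weight_kmst_cost(n, edges, k, w1, w2):
    """Optimal kMST cost when every edge weight is w1 or w2 (w1 < w2).

    Assumes any two components of the weight-w1 subgraph G_1 are joined by
    some weight-w2 edge (e.g. G is complete), as the construction requires.
    """
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Connected components of G_1 (edges of weight w1 only).
    for u, v, w in edges:
        if w == w1:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    comp_size = {}
    for v in range(n):
        root = find(v)
        comp_size[root] = comp_size.get(root, 0) + 1
    # Take the largest components until at least k nodes are covered: r of them.
    covered, r = 0, 0
    for s in sorted(comp_size.values(), reverse=True):
        covered += s
        r += 1
        if covered >= k:
            # k - r edges of weight w1 inside components, r - 1 of weight w2 between.
            return (k - r) * w1 + (r - 1) * w2
    raise ValueError("k exceeds the number of nodes")
```

Taking the largest components first is what makes r minimum; a tree on k nodes that touches r' components of $G_1$ can use at most $k - r'$ edges of weight $w_1$, and $r' \ge r$.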



Figure 1: The basic NP-hardness reduction from Steiner tree to kMST. Each terminal of G is
attached to a path of $X = |V| - |R| + 1$ new vertices joined by zero-weight edges, and $k = |R| \cdot X$.
3 The approximation algorithm for the general case



In this section, we present the proof of Theorem 1.2. As input, we are given an undirected 
graph G with nonnegative edge weights and an integer k. 

3.1 The algorithm and its running time 

It is useful to think of the algorithm as running in two distinct phases: a merge phase and 
a collect phase. 

During the merge phase, the algorithm maintains a set of clusters and a spanning tree 
on the vertex set of each cluster. Initially each vertex forms a singleton cluster. At each 
step of the merge phase, we choose an edge of minimum cost among all edges that are 
between two clusters, and merge them by using the edge to connect their spanning trees. 

Define the size of a cluster to be the number of vertices that it contains. During the 
course of the merge phase, the clusters grow in size. The collect phase is entered only when 

(i) there exist at most $\sqrt{k}$ clusters whose sizes sum to at least k, and

(ii) no cluster has size k or more. 

In the collect phase, we consider each cluster in turn as the root and perform a shortest-path
computation between clusters using the weights on inter-cluster edges. We determine
for each cluster C the shortest distance $d_C$ such that, within distance $d_C$ from C, there
exist at most $\sqrt{k}$ clusters whose sizes sum to at least k. Note that by the first precondition
for starting the collect phase, the distance $d_C$ is well defined. We choose the cluster C with
the minimum value of $d_C$ and connect it using shortest paths of length at most $d_C$ to each
of these $\sqrt{k}$ clusters. We can prune edges from some of these shortest paths to output a
tree of clusters whose sizes sum to k. We may do this since any cluster has fewer than k
nodes at the start of this phase by the second precondition.

The merge phase of the algorithm continues to run until both the preconditions of the 
collect phase are satisfied. Beginning with the step of the merge phase after which both 
preconditions of the collect phase are satisfied, at each subsequent step, the algorithm forks 
off an execution of the collect phase for the current configuration of clusters. The merge 
phase continues to run until a cluster of size k or more is formed. Next, the merge phase prunes
the edges of the spanning tree of the cluster whose size is between k and 2k so as to obtain 
a spanning tree of size exactly k. At this point, the merge phase terminates and outputs 
the spanning tree of the cluster of size k. Each forked execution of the collect phase outputs 
a spanning tree of size between k and 2k as well. The algorithm finally outputs the tree of 
least weight among all these trees. The algorithm is given below: 

Algorithm Merge-Collect

1. Initialize each vertex to be a singleton connected component and the set of edges
chosen by the algorithm to be $\emptyset$. Initialize the iteration count i = 1.

2. Repeat until there exists a cluster whose size is between k and 2k 
(a) Let $VS_i = \{C_1, \ldots, C_{n-i+1}\}$ denote the set of connected components at the start of this
iteration. Assume that the components are numbered in non-increasing order of
their size.

(b) Form an auxiliary graph $G(VS_i, E')$ where the edge $(C_p, C_q)$ between two components
is the minimum cost edge in E whose endpoints belong to $C_p$ and $C_q$
respectively.

(c) Choose a minimum cost edge $(C_p, C_q)$ in $G(VS_i, E')$ and merge the corresponding
clusters $C_p$ and $C_q$.

(d) $VS_{i+1} = (VS_i \setminus \{C_p, C_q\}) \cup \{C_p \cup C_q\}$

Remark: This corresponds to one iteration of the merge phase.

(e) Let $j^* = \min\{j : \sum_{t=1}^{j} |C_t| \geq k\}$, where the components of $VS_{i+1}$ are indexed in non-increasing order of size.

(f) If $j^* \leq \sqrt{k}$ then $SOL_i = \mathrm{Collect}(G(VS_{i+1}, E'))$.

(g) i = i + 1;

3. Prune the edges of the cluster whose size is between k and 2k to obtain a tree with 
exactly k vertices. Denote the tree obtained by MSOL. 

4. The output of the heuristic is the minimum-weight tree among MSOL and all the
$SOL_i$'s.
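The merge phase can be sketched in Python as follows. This is a minimal sketch under stated assumptions: edges are `(u, v, w)` triples on vertices `0..n-1`, ties between equal-weight edges are broken by sort order, and `merge_phase` is a hypothetical name. Scanning edges Kruskal-style with union-find realizes "choose a minimum-cost inter-cluster edge" at every step. The sketch only records the steps at which a collect phase would be forked; it does not run the collect phase or prune the final cluster.

```python
import math

def merge_phase(n, edges, k):
    """Sketch of the merge phase of Algorithm Merge-Collect.

    Returns (tree_edges, forks): the spanning-tree edges of the first cluster
    of size >= k (before pruning; None if none forms), and, for every step
    after which a collect phase would be forked, the cluster sizes then.
    """
    parent = list(range(n))
    size = [1] * n
    tree = {v: [] for v in range(n)}  # spanning-tree edges, keyed by cluster root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forks = []
    # Scanning edges by nondecreasing weight and skipping intra-cluster ones
    # selects, at every step, a minimum-cost edge between two distinct clusters.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        tree[ru] += tree.pop(rv) + [(u, v, w)]
        if size[ru] >= k:
            return tree[ru], forks  # step 3 would now prune this tree to k nodes
        # Step (e): j* = number of largest clusters needed to reach k vertices.
        sizes = sorted((size[root] for root in tree), reverse=True)
        covered, j_star = 0, 0
        for s in sizes:
            covered += s
            j_star += 1
            if covered >= k:
                break
        # Step (f): no cluster has size >= k here, so only precondition (i) is tested.
        if covered >= k and j_star <= math.sqrt(k):
            forks.append(sizes)
    return None, forks
```

For example, on the path-like graph used in the test below with k = 4, collect phases would be forked after the second and third merges, and the merge phase stops once a cluster of four vertices forms.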

Procedure Collect(G(V, E))

1. For each cluster vertex C do 

(a) With the cluster C as the root, form a shortest path tree. 

(b) Let $d_C$ denote the shortest distance from C such that, within distance $d_C$, there are
no more than $\sqrt{k}$ clusters whose sizes sum up to at least k.

(c) Choose these clusters and join them to the root cluster by using the edges in the 
shortest path tree computed in Step 1(a). 

(d) Prune the edges of the tree to obtain a tree having exactly k nodes. 

2. Output the tree corresponding to the choice of the root cluster C that minimizes $d_C$.
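Steps 1(a), 1(b) and 2 of Procedure Collect can be sketched in Python as below, assuming the cluster graph is already built (one node per cluster, each edge weighted by the cheapest original edge between the two clusters). The sketch only computes $d_C$ and the minimizing root; constructing and pruning the output tree (steps 1(c), 1(d)) is omitted, and the function names are illustrative.

```python
import heapq
import math

def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> [(neighbor, weight)]."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            nd = d + w
            if nd < dist.get(y, math.inf):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

def collect_radius(adj, sizes, k):
    """Return (d_C, C) for the root cluster C minimizing d_C, or (inf, None).

    adj: cluster graph, cluster -> [(other cluster, cheapest inter-cluster edge)];
    sizes: cluster -> number of vertices. d_C is the smallest radius around C
    within which at most floor(sqrt(k)) clusters have sizes summing to >= k.
    """
    budget = math.isqrt(k)  # "at most sqrt(k) clusters", rounded down
    best = (math.inf, None)
    for root in adj:
        dist = dijkstra(adj, root)
        reached = []  # sizes of the clusters within the current radius
        for c in sorted(dist, key=dist.get):
            reached.append(sizes[c])
            # The best use of the budget is the largest clusters reached so far.
            if sum(sorted(reached, reverse=True)[:budget]) >= k:
                if dist[c] < best[0]:
                    best = (dist[c], root)
                break
    return best
```

Rounding $\sqrt{k}$ down is a reading of "at most $\sqrt{k}$ clusters", since a number of clusters is an integer.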

It is easy to see that there are at most O(n) steps in the merge phase and hence at most
this many instances of the collect phase to be run. Each collect phase runs Dijkstra's
algorithm [11] from each of at most n root clusters, at a cost of $O(m + n\log n)$ per run, so the
whole algorithm runs in time $O(n^2(m + n\log n))$, where m and n denote the number of edges
and nodes in the input graph respectively. The running time of the collect phase dominates
the running time of the merge phase.
3.2 The performance guarantee

Consider an optimal kMST of weight OPT. During the merge phase, nodes of this tree
may merge with other nodes in clusters. We focus our attention on the number of edges
of the optimal kMST that are exposed, i.e., remain as inter-cluster edges. We show that
at any step in which a large number of edges of the kMST are exposed, every edge in the
spanning tree of each cluster has small weight.

Lemma 3.1 If at the beginning of a step of the merge phase, an optimal kMST has at least
x exposed edges (inter-cluster edges), then each edge in the spanning tree of any cluster at
the end of the step has weight at most $\mathrm{OPT}/x$.

Proof: The proof uses induction on the number of steps. Suppose that an optimal kMST
has at least x exposed edges at the beginning of the current step of the merge phase.
Since merging never turns an intra-cluster edge into an inter-cluster edge, at the beginning
of the previous step the optimal kMST must have had at least x exposed edges as well. Thus
by the induction hypothesis every edge in the spanning tree of any cluster at the end of the
previous step has weight at most $\mathrm{OPT}/x$. Since only one new composite cluster is formed
in the current step, it remains to show that the edge added in this iteration has cost at most
$\mathrm{OPT}/x$. But this is straightforward: there is an optimal kMST with at least x exposed edges
of total weight at most OPT, so some inter-cluster edge has weight at most $\mathrm{OPT}/x$, and the
merge phase always chooses a minimum-cost inter-cluster edge. □

We now prove the performance guarantee in Theorem 1.2. The above lemma is useful
as long as the number of exposed edges is high. Applying the lemma with $x = \sqrt{k}$ shows
that every edge in the spanning tree of each cluster has weight at most $\mathrm{OPT}/\sqrt{k}$. Consider the
scenario when the merge phase runs to completion to produce a tree with at least k nodes
even before the number of exposed edges falls below $\sqrt{k}$. In this case, since the resulting
tree has at most k nodes, the cost of the tree is at most $(\mathrm{OPT}/\sqrt{k}) \cdot k = \sqrt{k} \cdot \mathrm{OPT}$.

Otherwise, the number of exposed edges falls below $\sqrt{k}$ before the merge phase runs to
completion. However, in this case, note that both preconditions for the start of the collect
phase will have been satisfied. Hence the algorithm must have forked off a run of the collect
phase. We show that the tree output by this run has low weight. Consider a shortest-path
computation of the collect phase rooted at a cluster containing a node of the optimal kMST.
Then clearly, within a distance at most OPT, we can find at most $\sqrt{k}$ clusters whose sizes
sum to at least k: since the number of exposed edges is less than $\sqrt{k}$, the clusters containing
nodes of the optimal tree form such a collection. Hence the root chosen by the collect phase,
which minimizes $d_C$, satisfies $d_C \le \mathrm{OPT}$. Since there are at most $\sqrt{k}$ clusters to
connect to, the weight of these connections is at most $\sqrt{k} \cdot \mathrm{OPT}$. It remains to bound the
weight of the spanning trees within each of the clusters retained in the output solution.
This is not hard since all edges in these clusters have weight at most $\mathrm{OPT}/\sqrt{k}$ by Lemma 3.1.
Since the size of the output tree is at most k (as a result of the pruning), the total weight
of all the edges retained within these clusters is at most $\sqrt{k} \cdot \mathrm{OPT}$. Summing the weight of
these intra-cluster edges and the inter-cluster connections shows that the output tree has
cost at most $2\sqrt{k} \cdot \mathrm{OPT}$. This proves the performance ratio of $2\sqrt{k}$ claimed in Theorem
1.2.
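In the collect-phase case, the accounting above can be summarized as follows, using $d_C \le \mathrm{OPT}$ for the chosen root and the fact that an output tree on k nodes has at most $k - 1$ intra-cluster edges:

```latex
\underbrace{(k-1)\cdot\frac{\mathrm{OPT}}{\sqrt{k}}}_{\text{intra-cluster edges}}
\;+\;
\underbrace{\sqrt{k}\cdot d_C}_{\text{connecting paths}}
\;\le\; \sqrt{k}\cdot\mathrm{OPT} + \sqrt{k}\cdot\mathrm{OPT}
\;=\; 2\sqrt{k}\cdot\mathrm{OPT}.
```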

The example in Figure 2 shows that the performance ratio of the algorithm is $\Omega(\sqrt{k})$.
Figure 2: Example of a graph in which the algorithm in Theorem 1.2 outputs a tree of
weight $\Omega(\mathrm{OPT} \cdot \sqrt{k})$. The optimal kMST is the horizontal path made of zero-weight edges
and the $\sqrt{k}$ edges of weight $\mathrm{OPT}/\sqrt{k}$ each. All zero-weight edges will be chosen first in the
merge phase. The merge phase running to completion will extend each of the zero-weight
upward-directed paths to include $\Omega(k)$ edges, each of weight …, resulting in a tree of
weight $\Omega(\mathrm{OPT} \cdot \sqrt{k})$. The collect phases may output trees consisting of all the $(\sqrt{k}+1)$-sized
clusters at the bottom of the figure, each of weight $\Omega(\mathrm{OPT} \cdot \sqrt{k})$.
4 An approximation algorithm for points on the plane



In this section, we present a heuristic for the kMST problem for points on the plane and a
proof of its performance guarantee. Let $S = \{s_1, s_2, \ldots, s_n\}$ denote the given set of points.
For any pair of points $s_i$ and $s_j$, let d(i, j) denote the Euclidean distance between $s_i$ and $s_j$.

4.1 The heuristic 

I. For each distinct pair of points $s_i, s_j$ in S do

(1) Construct the circle C with diameter $\delta = \sqrt{3}\, d(i,j)$ centered at the midpoint of
the line segment $(s_i, s_j)$.

(2) Let $S_C$ be the subset of S contained in C. If $S_C$ contains fewer than k points, skip
to the next iteration of the loop (i.e., try the next pair of points). Otherwise, do
the following.

(3) Let Q be the square of side $\delta$ circumscribing C.

(4) Divide Q into k square cells each with side $\delta/\sqrt{k}$.

(5) Sort the cells in non-increasing order of the number of points from $S_C$ they contain
and choose the minimum number of cells so that the chosen cells together contain at least k
points. If necessary, arbitrarily discard points from the last chosen cell so that
the total number of points in all the cells is equal to k.

(6) Construct a minimum spanning tree for the k chosen points. (For the rectilinear
case, construct a rectilinear minimum spanning tree for the k chosen points.)

(7) The solution value for the pair $(s_i, s_j)$ is the length of this MST.

II. Output the smallest solution value found. 
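The heuristic can be sketched in Python as follows. Two details are assumptions made here rather than taken from the text: when k is not a perfect square, the square Q is split into a $\lceil\sqrt{k}\rceil \times \lceil\sqrt{k}\rceil$ grid, and the input points are assumed to be distinct tuples. The sketch returns only the best MST length, uses a quadratic-time Prim's algorithm on the k chosen points, and its function names are illustrative.

```python
import math
from itertools import combinations

def mst_length(pts):
    """Prim's algorithm on the complete Euclidean graph, O(len(pts)^2)."""
    if len(pts) < 2:
        return 0.0
    best = [math.inf] * len(pts)  # cheapest connection to the growing tree
    in_tree = [False] * len(pts)
    best[0], total = 0.0, 0.0
    for _ in range(len(pts)):
        i = min((j for j in range(len(pts)) if not in_tree[j]), key=best.__getitem__)
        in_tree[i] = True
        total += best[i]
        for j in range(len(pts)):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(pts[i], pts[j]))
    return total

def planar_kmst_heuristic(points, k):
    """Best MST length found by the heuristic of Section 4.1 (sketch)."""
    m = math.isqrt(k)
    if m * m < k:
        m += 1  # assumption: ceil(sqrt(k)) cells per side when k is not a square
    best = math.inf
    for p, q in combinations(points, 2):
        delta = math.sqrt(3) * math.dist(p, q)  # step (1)
        if delta == 0:
            continue  # assumption: points are distinct
        cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
        # Step (2): points in the circle (small tolerance for rounding).
        inside = [s for s in points if math.dist(s, (cx, cy)) <= delta / 2 + 1e-12]
        if len(inside) < k:
            continue
        # Steps (3)-(4): square of side delta around C, cut into m x m cells.
        cell = delta / m
        x0, y0 = cx - delta / 2, cy - delta / 2
        buckets = {}
        for s in inside:
            a = max(0, min(int((s[0] - x0) // cell), m - 1))
            b = max(0, min(int((s[1] - y0) // cell), m - 1))
            buckets.setdefault((a, b), []).append(s)
        # Step (5): fullest cells first, until at least k points are chosen.
        chosen = []
        for pts in sorted(buckets.values(), key=len, reverse=True):
            chosen.extend(pts)
            if len(chosen) >= k:
                break
        # Steps (6)-(7): MST of exactly k points; keep the best over all pairs.
        best = min(best, mst_length(chosen[:k]))
    return best
```

The outer loop over all pairs is what guarantees that some iteration uses a diametral pair of an optimal solution, which is the iteration analysed in the next subsection.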

It is easy to see that the above heuristic runs in polynomial time. In the next subsection, 
we show that the heuristic provides a performance guarantee of $O(k^{1/4})$. We begin with
some lemmas. 

4.2 The performance guarantee 

Lemma 4.1 Let S denote a set of points on the plane, with diameter $\Delta$. Let a and b be
two points in S such that $d(a,b) = \Delta$. Then the circle with diameter $\sqrt{3}\Delta$ centered at the
midpoint of the line segment (a, b) contains S.

Proof: Suppose there exists a point $p \in S$ not contained within the circle of diameter $\sqrt{3}\Delta$
centered at the midpoint m of the line segment (a, b). If p lies on the perpendicular bisector
of the line segment (a, b) then it is clear that $d(a,p) = d(b,p) > \Delta$; otherwise p is closer to one of
a and b than to the other. Say p is closer to a; then $d(b,p) > \Delta$, because Apollonius's theorem
gives $d(a,p)^2 + d(b,p)^2 = 2\,d(m,p)^2 + \Delta^2/2 > 2\Delta^2$. Thus, if
there exists a point outside the circle then it contradicts the fact that the diameter of the
set S is $\Delta$. Hence S must be contained within the circle. □
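Lemma 4.1 is also easy to sanity-check numerically. The following Python snippet (a check on random inputs, not a proof; the helper name is hypothetical) picks a diametral pair of each random point set and tests whether every point lies in the circle of diameter $\sqrt{3}\Delta$ around its midpoint.

```python
import math
import random

def enclosing_circle_holds(points):
    """Check Lemma 4.1 on one point set: the circle of diameter sqrt(3)*Delta
    centred at the midpoint of a diametral pair contains every point."""
    a, b = max(((p, q) for p in points for q in points),
               key=lambda pq: math.dist(*pq))
    delta = math.dist(a, b)
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    radius = math.sqrt(3) * delta / 2
    # Small tolerance for floating-point rounding.
    return all(math.dist(p, mid) <= radius + 1e-9 for p in points)

rng = random.Random(0)
trials = [[(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
          for _ in range(200)]
print(all(enclosing_circle_holds(pts) for pts in trials))  # prints True
```

An equilateral triangle is the extreme case: its third vertex lies exactly on the circle, which is why the factor $\sqrt{3}$ cannot be reduced.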

Lower Bounds on an Optimal kMST

The following lemma is used to establish a lower bound on OPT. 

Lemma 4.2 Consider a square grid on the plane with the side of each cell being $\sigma$. Then
the length of an MST for any set of t points, where each point is from a distinct cell, is
$\Omega(t\sigma)$.

Proof: Pick a point from the set and discard all points in the eight cells neighboring the cell
containing the chosen point. Doing this repeatedly we choose a subcollection of at least $\lceil t/9 \rceil$
points such that the distance between any pair of points in the subcollection is at least $\sigma$.
The lemma then follows from the observation that any tree spanning m points that are
pairwise $\sigma$-distant has length $\Omega(m\sigma)$. □
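The greedy step of this proof can be sketched in Python as below, assuming at most one point per cell as in the lemma; the function name is hypothetical.

```python
import math

def separated_subcollection(points, sigma):
    """Greedy selection from the proof of Lemma 4.2 (sketch).

    points: at most one point per cell of a grid with cells of side sigma.
    Repeatedly keeps a point and discards every point lying in one of the
    eight cells surrounding its cell.
    """
    cell = {p: (math.floor(p[0] / sigma), math.floor(p[1] / sigma)) for p in points}
    remaining = list(points)
    kept = []
    while remaining:
        p = remaining.pop(0)
        kept.append(p)
        cx, cy = cell[p]
        # Keep only points whose cells differ from p's cell by >= 2 in some coordinate.
        remaining = [q for q in remaining
                     if max(abs(cell[q][0] - cx), abs(cell[q][1] - cy)) > 1]
    return kept
```

Each round discards at most nine points (one per cell), which is where the bound of at least $\lceil t/9 \rceil$ kept points comes from; two kept points lie in cells that are at least two apart in some coordinate, so they are more than $\sigma$ apart.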
Let P* denote the set of points in an optimal solution to the problem instance. Let $\Delta$
denote the diameter of P* (i.e., the maximum distance between a pair of points in P*),
and OPT denote the length of an MST for P*. Consider an iteration in which the circle
constructed by the heuristic is defined by two points a and b in P* such that $d(a,b) = \Delta$.
Let g be the number of square cells used by the heuristic in selecting k points in this
iteration. To establish the performance guarantee of the heuristic, we show that the length
of the MST constructed by the heuristic during this iteration is within a factor $O(k^{1/4})$ of
OPT.

It is easy to see that $\mathrm{OPT} \geq \Delta$, since any tree spanning a and b has length at least $d(a,b) = \Delta$.

Since the heuristic uses a minimum number (g) of square cells in selecting k points, and P*
lies inside the circle of this iteration by Lemma 4.1, the points in P* must occur in g or more
square cells. Note that the side of each square cell is $\sqrt{3}\Delta/\sqrt{k}$. This gives us the
following corollary to Lemma 4.2.

Corollary 4.3

$\mathrm{OPT} = \Omega(g\Delta/\sqrt{k})$
Upper Bound on the Cost of the Heuristic 

We now prove an upper bound on the cost of the spanning tree returned by the heuristic. 
For this, we need the following lemma. 

Lemma 4.4 The length of a minimum spanning tree for any set of q points in a square
with side $\sigma$ is $O(\sigma\sqrt{q})$.

Proof: Paste a square grid over the square where each sub-cell in the grid has side $\sigma/\sqrt{q}$.
Connect each point to a closest vertex in the grid. Consider the tree consisting of one
vertical line, all the horizontal lines in the grid connected to the vertical line, and the
vertical lines connecting each point to its nearest horizontal line (see Figure 3). It is clear
that the grid lines in the tree have total length $O(\sigma\sqrt{q})$ and the lines connecting the points
to the grid have total length $q \cdot O(\sigma/\sqrt{q}) = O(\sigma\sqrt{q})$. Since an MST of the q points is at most
twice as long as any tree connecting them (even one using additional points), the bound follows. □
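The comb construction can be measured directly. This Python sketch assumes the points lie in the square $[0, \sigma] \times [0, \sigma]$ and uses $\lceil\sqrt{q}\rceil$ grid strips (an assumption so that the line spacing is at most $\sigma/\sqrt{q}$); the function name is hypothetical. It checks the explicit bound $1.5\,\sigma\sqrt{q} + 3\sigma$ that follows from the counting in the proof.

```python
import math
import random

def comb_tree_length(points, sigma):
    """Length of the comb-shaped tree from the proof of Lemma 4.4 (sketch).

    points lie in [0, sigma] x [0, sigma]. The tree uses one vertical line,
    c + 1 horizontal grid lines with spacing h = sigma / c where
    c = ceil(sqrt(q)), and a vertical segment from each point to its
    nearest horizontal line.
    """
    q = len(points)
    c = max(1, math.ceil(math.sqrt(q)))
    h = sigma / c
    grid = sigma + (c + 1) * sigma  # one vertical line plus c + 1 horizontal lines
    # Distance from y to the nearest multiple of h is at most h / 2.
    connectors = sum(min(y % h, h - y % h) for _, y in points)
    return grid + connectors

rng = random.Random(1)
sample = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]
print(comb_tree_length(sample, 10.0) <= 1.5 * 10.0 * math.sqrt(100) + 3 * 10.0)  # prints True
```

The grid part contributes $(c + 2)\sigma \le (\sqrt{q} + 3)\sigma$ and the connectors at most $q \cdot h/2 \le \sigma\sqrt{q}/2$, which gives the bound checked above.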
Figure 3: A spanning tree of length $O(\sigma\sqrt{q})$ on any q points in a square of side $\sigma$.

Lemma 4.5 The length of the spanning tree constructed by the heuristic is $O(\sqrt{g}\,\Delta)$.

Proof: Let $Q_i$ denote the set of points in the i-th cell chosen by the heuristic, $1 \le i \le g$.
Thus $\sum_{i=1}^{g} |Q_i| = k$. Consider the following two-stage procedure for constructing a spanning
tree for the points in $\bigcup_{i=1}^{g} Q_i$.

Stage I: Construct a minimum spanning tree for the points in $Q_i$, $1 \le i \le g$. Note that the
points in $Q_i$ are within a square of side $\sqrt{3}\Delta/\sqrt{k}$. Using Lemma 4.4, the length of an MST
for $Q_i$ is $O\big((\Delta/\sqrt{k})\sqrt{|Q_i|}\big)$. Thus, the total length of all the minimum spanning trees constructed
in this stage is $O\big((\Delta/\sqrt{k}) \sum_{i=1}^{g} \sqrt{|Q_i|}\big) = O(\sqrt{g}\,\Delta)$ by the Cauchy-Schwarz inequality.

Stage II: Connect the g spanning trees constructed in Stage I into a single spanning tree
as follows. Choose a point arbitrarily from each $Q_i$ ($1 \le i \le g$), and construct an MST for
the g chosen points. Note that these g points are within a square of side $\sqrt{3}\Delta$. Thus, by
Lemma 4.4, the length of the MST constructed in this stage is $O(\sqrt{g}\,\Delta)$ as well.

Thus, the total length of the spanning tree constructed by the two-stage procedure is
$O(\sqrt{g}\,\Delta)$, and the MST built by the heuristic on the same k points is no longer. □
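For reference, the Cauchy-Schwarz step in Stage I, applied to the vectors $(1, \ldots, 1)$ and $(\sqrt{|Q_1|}, \ldots, \sqrt{|Q_g|})$ together with $\sum_i |Q_i| = k$, reads:

```latex
\sum_{i=1}^{g} \sqrt{|Q_i|}
  \;\le\; \sqrt{g}\,\Bigl(\sum_{i=1}^{g} |Q_i|\Bigr)^{1/2}
  \;=\; \sqrt{gk},
\qquad\text{hence}\qquad
\frac{\Delta}{\sqrt{k}} \sum_{i=1}^{g} \sqrt{|Q_i|} \;\le\; \sqrt{g}\,\Delta .
```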

The Final Analysis 

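We are now ready to complete the proof of the performance bound. As argued above,
$\mathrm{OPT} = \Omega(\Delta)$, and from Corollary 4.3, $\mathrm{OPT} = \Omega(g\Delta/\sqrt{k})$. Thus $\mathrm{OPT} = \Omega(\max\{\Delta, g\Delta/\sqrt{k}\})$.
Also from Lemma 4.5, the length of the spanning tree produced by the heuristic is $O(\sqrt{g}\,\Delta)$.
Therefore, the performance ratio is $O(\min\{\sqrt{g}, \sqrt{k/g}\}) = O(k^{1/4})$ as claimed, since the
minimum of the two terms is largest when $g = \sqrt{k}$.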
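Written out, the final step divides the upper bound of Lemma 4.5 by the lower bound on OPT and uses the fact that the minimum of two nonnegative numbers is at most their geometric mean:

```latex
\frac{O(\sqrt{g}\,\Delta)}{\Omega\bigl(\max\{\Delta,\ g\Delta/\sqrt{k}\}\bigr)}
  = O\Bigl(\min\bigl\{\sqrt{g},\ \sqrt{k/g}\bigr\}\Bigr),
\qquad
\min\bigl\{\sqrt{g},\ \sqrt{k/g}\bigr\}
  \le \bigl(\sqrt{g}\cdot\sqrt{k/g}\bigr)^{1/2} = k^{1/4}.
```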

The example in Figure 4 shows that the performance ratio of the heuristic is $\Omega(k^{1/4})$.

Observe that both our lower bounds on an optimal solution and the upper bound on the
spanning tree obtained also apply to the case of constructing a rectilinear kMST. Hence it
follows that the above approximation algorithm delivers a performance guarantee of $O(k^{1/4})$
for the rectilinear kMST problem too. This proves Corollary 1.4.



Square with points 
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Diagonal cells with J"|k points clustered together 

Uniformly distributed cells withsfiT points scattered 
uniformly in each 



Figure 4: Example of a configuration of points in the plane in which the heuristic outputs a tree of length Ω(OPT · k^{1/4}). The big square has side a. Each cell of the square grid has side a/√k. There are √k points clustered closely together in each cell along the diagonal of the big square, and in each of √k cells distributed uniformly throughout the big square there are √k uniformly distributed points. The heuristic may pick up the points in the uniformly distributed cells, forming a tree of length Ω(a · k^{1/4}), while the tree spanning the points along the diagonal has length O(a).
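A rough accounting of the lengths in Figure 4, using the standard estimate that m points spread evenly over a square of side L need a spanning tree of length on the order of L√m (an assumption of this sketch, not a statement from the caption):

```latex
\text{uniform cells: } \underbrace{\sqrt{k}}_{\text{cells}} \cdot
  \underbrace{\tfrac{a}{\sqrt{k}}\,\sqrt{\sqrt{k}}}_{\text{tree inside one cell}} = a\,k^{1/4},
\qquad
\text{linking the } \sqrt{k} \text{ cells: } a\,\sqrt{\sqrt{k}} = a\,k^{1/4},
\qquad
\text{diagonal: } O(a).
```

So the tree found by the heuristic has length Ω(a · k^{1/4}) while the optimum is O(a), a ratio of Ω(k^{1/4}).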



5 Exact algorithms for special cases 
5.1 kMST for Decomposable Graphs

In this section, we prove Theorem 1.5. A class of decomposable graphs Γ is given by a set of rules satisfying the following conditions [7].

1. The number of primitive graphs in Γ is finite.

2. Each graph in Γ has an ordered set of special nodes called terminals. The number of terminals in each graph is bounded by a constant.

3. There is a finite collection of binary composition rules that operate only at terminals, either by identifying two terminals or by adding an edge between terminals. A composition rule also determines the terminals of the resulting graph, which must be a subset of the terminals of the two graphs being composed.
Examples of decomposable graphs include trees, series-parallel graphs, bounded-bandwidth graphs, etc. [7].

Let Γ be any class of decomposable graphs. The kMST problem for Γ can be solved optimally in polynomial time using dynamic programming. Following [7], it is assumed that a given graph G is accompanied by a parse tree specifying how G is constructed using the rules, and that the size of the parse tree is linear in the number of nodes of G.

Consider a fixed class of decomposable graphs Γ. Suppose that G is a graph in Γ. Let π be a partition of a nonempty subset of the terminals of G. We define the following set of costs for G.

Cost_i^π(G) = Minimum total cost of any forest containing a tree for each block of π, such that the terminal nodes occurring in each tree are exactly the members of the corresponding block of π, no pair of trees is connected, the total number of edges in the forest is i, and each tree contains at least one edge (1 ≤ i ≤ k).

Cost_{k−1}^∅(G) = Minimum cost of a tree within G containing k − 1 edges and containing no terminal nodes of G.

For any of the above costs, if there is no forest satisfying the required conditions, the value of Cost is defined to be ∞.

Note that because Γ is fixed, the number of cost values associated with any graph in the parse tree for G is O(k). We now show how the cost values can be computed in a bottom-up manner, given the parse tree for G.

To begin with, since Γ is fixed, the number of primitive graphs is finite. For a primitive graph, each cost value can be computed in constant time, since the number of forests to be examined is fixed. Now consider computing the cost values for a graph G constructed from subgraphs G_1 and G_2, where the cost values for G_1 and G_2 have already been computed.

Let Π_{G_1}, Π_{G_2} and Π_G be the sets of partitions of a subset of the terminals of G_1, G_2 and G, respectively. Let A be the set of edges added to G_1 and G_2 by the composition rule R used in constructing G from G_1 and G_2. Corresponding to rule R, there is a partial function f_R : Π_{G_1} × Π_{G_2} × 2^A → Π_G such that a forest corresponding to partition π_1 in Π_{G_1}, a forest corresponding to partition π_2 in Π_{G_2}, and a subset B ⊆ A combine to form a forest corresponding to partition f_R(π_1, π_2, B) of G. Furthermore, if the forest corresponding to π_1 contains i edges and the forest corresponding to π_2 contains j edges, then the combined forest in G contains i + j + |B| edges.

Similarly, there is a partial function g_R : Π_{G_1} × 2^A → Π_G such that a forest corresponding to partition π_1 in Π_{G_1} and a subset B ⊆ A combine to form a forest corresponding to partition g_R(π_1, B) of G. If the forest corresponding to π_1 contains i edges, then the combined forest in G contains i + |B| edges. There is also a similar partial function h_R : Π_{G_2} × 2^A → Π_G. Finally, there is a partial function j_R : 2^A → Π_G.

Using the functions f_R, g_R, h_R and j_R, the cost values for G can be computed from the sets of cost values for G_1 and G_2. For instance, suppose that f_R(π_1, π_2, B) = π. Then a contributor to computing Cost_i^π(G) is Cost_l^{π_1}(G_1) + Cost_{i−l−|B|}^{π_2}(G_2) + w(B), for each l such that 1 ≤ l ≤ i − |B| − 1. Here w(B) is the total cost of all edges in B. The value of Cost_i^π(G) is the minimum value among its contributors.

When all the cost values for the entire graph G have been computed, the cost of an optimal kMST is equal to min_π {Cost_{k−1}^π(G)}, where the minimum is taken over those π for which the corresponding forest consists of a single tree.

We now analyze the running time of the algorithm. For each graph occurring in the parse tree, there are O(k) cost values to be computed. Each of the cost values can be computed in O(k) time. As in [7], we assume that the size of the given parse tree for G is O(n). Then the dynamic programming algorithm takes time O(nk^2). This completes the proof of Theorem 1.5.

5.2 kMST for points on the boundary of a convex region

We now restrict our attention to the case where we are given n points that lie on the boundary of a convex region, and show that the kMST on these points can be computed in polynomial time using dynamic programming. We also provide a faster algorithm when the points are constrained to lie on a circle.

Lemma 5.1 Any optimal kMST for a set of points in the plane is non self-intersecting. 

Proof: Suppose an optimal kMST were self-intersecting, and let (a, b) and (c, d) be two intersecting line segments. On removing the edges (a, b) and (c, d) from the kMST we get three connected components; hence some two vertices, one from {a, b} and one from {c, d}, must be in the same connected component. Say a and d are in the same connected component. Since in any convex quadrilateral the sum of two opposite sides is less than the sum of the two diagonals, replacing (a, b) and (c, d) by (a, c) and (b, d) we still get a tree spanning k nodes, but with smaller weight. This contradicts the optimality of the kMST we started with. Hence any optimal kMST on a set of points in the plane must be non-self-intersecting. □

Lemma 5.2 Given n points on the boundary of a convex polygon, no vertex in an optimal kMST of these points has degree greater than 4.

Proof: Suppose there is a vertex v in an optimal kMST with degree greater than 4. Let v_1, v_2, ..., v_d, d ≥ 5, be its neighbors in the optimal kMST, in angular order around v (see Fig. 5). Using the well-known fact that any convex polygon lies entirely on one side of a supporting line, we have ∠v_1 v v_d ≤ 180°. By the pigeonhole principle, there is an i, 1 ≤ i ≤ d − 1, such that ∠v_i v v_{i+1} ≤ 180°/(d − 1) < 60°, since d is at least 5. Thus in the triangle v_i v v_{i+1}, the angle ∠v_i v v_{i+1} is not the largest angle, so the opposite side v_i v_{i+1} is not the longest side. Therefore, replacing the longer of the edges (v_i, v) and (v_{i+1}, v) in the optimal kMST with (v_i, v_{i+1}), we obtain a tree with smaller weight, contradicting the assumption that the kMST was optimal. This completes the proof. □

We now characterize the structure of an optimal solution in the following decomposition lemma, and use it to define the subproblems that we solve recursively using dynamic programming.
Figure 5: Points on a convex polygon.

Lemma 5.3 (Decomposition lemma.) Let v_0, v_1, ..., v_{n−1} be the vertices of a convex polygon in, say, clockwise order. Let v_i be a vertex of degree d_i in an optimal kMST. Note that 1 ≤ d_i ≤ 4.

If d_i ≥ 2, let the removal of v_i from the optimal kMST produce connected components C_1, C_2, ..., C_{d_i} (see Fig. 6). Let |C_j| denote the number of vertices in component C_j. Then there exists a partition of v_{i+1}, v_{i+2}, ..., v_{i−1} (indices taken mod n) into d_i contiguous subsegments S_1, S_2, ..., S_{d_i} such that, for all j, 1 ≤ j ≤ d_i, the optimal kMST induced on S_j ∪ {v_i} is an optimal (|C_j| + 1)MST on S_j ∪ {v_i} in which the degree of v_i is one.

If d_i = 1, let v_j be v_i's neighbor in the optimal kMST. Let v_j be adjacent to d_{j1} vertices in v_{i+1}, v_{i+2}, ..., v_{j−1} and to d_{j2} vertices in v_{j+1}, v_{j+2}, ..., v_{i−1}. Let the optimal kMST contain |C_1| vertices from the set v_{i+1}, v_{i+2}, ..., v_{j−1} and |C_2| vertices from the set v_{j+1}, v_{j+2}, ..., v_{i−1}. Then the optimal kMST induced on v_{i+1}, v_{i+2}, ..., v_j is an optimal (|C_1| + 1)MST on v_{i+1}, v_{i+2}, ..., v_j with the degree of v_j equal to d_{j1}, and the optimal kMST induced on v_j, v_{j+1}, ..., v_{i−1} is an optimal (|C_2| + 1)MST on v_j, v_{j+1}, ..., v_{i−1} with the degree of v_j equal to d_{j2}.

Proof: If d_i ≥ 2, then it is easy to see that a partition of v_{i+1}, v_{i+2}, ..., v_{i−1} into contiguous subsegments S_1, S_2, ..., S_{d_i} exists such that C_j ⊆ S_j for all j, 1 ≤ j ≤ d_i, because the optimal kMST is non-self-intersecting by Lemma 5.1. Further, the optimal kMST induced on S_j ∪ {v_i} must be an optimal (|C_j| + 1)MST on S_j ∪ {v_i} with the degree of v_i equal to 1, for otherwise we could replace it and obtain a lighter kMST. The proof of the case d_i = 1 is equally straightforward and is omitted. □

Figure 6: Decomposition.

Thus the subproblems we consider are specified by the following four parameters: a size s, a vertex v_i, the degree d_i of v_i, and a contiguous subsegment v_{k1}, v_{k1+1}, ..., v_{k2} such that i ∉ [k1 ... k2]. A solution to such a subproblem, denoted by SOLN(s; v_i; d_i; v_{k1}, v_{k1+1}, ..., v_{k2}), is the weight of an optimal sMST on {v_i, v_{k1}, v_{k1+1}, ..., v_{k2}} in which v_i has degree d_i. Using the decomposition lemma above, we can write a simple recurrence relation for SOLN(s; v_i; d_i; v_{k1}, v_{k1+1}, ..., v_{k2}):
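$$
\mathrm{SOLN}(s; v_i; d_i; v_{k_1}, \ldots, v_{k_2}) =
\begin{cases}
\infty, & \text{if } d_i = 0 \text{ or } s < d_i + 1 \text{ or } \big((k_2 - k_1 + 1) \bmod n\big) + 1 < s;\\[6pt]
\displaystyle\min_{\substack{k_1 - 1 = k'_0 < k'_1 < \cdots < k'_{d_i} = k_2\\ s_1 + \cdots + s_{d_i} = s + d_i - 1,\; s_j > 1}} \ \sum_{j=1}^{d_i} \mathrm{SOLN}\big(s_j; v_i; 1; v_{k'_{j-1}+1}, \ldots, v_{k'_j}\big), & \text{if } d_i \ge 2;\\[6pt]
\displaystyle\min_{k_1 \le j \le k_2}\ \min_{\substack{s_1 + s_2 = s\\ d_1,\, d_2}} \Big\{ w(v_i v_j) + \mathrm{SOLN}\big(s_1; v_j; d_1; v_{k_1}, \ldots, v_{j-1}\big) + \mathrm{SOLN}\big(s_2; v_j; d_2; v_{j+1}, \ldots, v_{k_2}\big) \Big\}, & \text{if } d_i = 1.
\end{cases}
$$

Here w(v_i v_j) is the cost of the edge (v_i, v_j). The weight of an optimal kMST is

$$
\min_{1 \le i \le n}\ \min_{1 \le d \le 4}\ \mathrm{SOLN}(k; v_i; d; v_{i+1}, v_{i+2}, \ldots, v_{i-1}).
$$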

Note that we have O(kn^3) subproblems, and each subproblem requires looking up the solutions to at most O(k^3 n^3) smaller subproblems. This yields a running time of O(k^4 n^6). When k = Ω(√n), this running time can be further improved by organizing the computation of the recurrences for the smaller subproblems better. Consider a vertex v_i, an integer s ≤ k denoting the size of the tree, and a partition of the other (n − 1) vertices into four groups. This corresponds to one of the subproblems we need to solve. A subproblem specified by the number of nodes s (≤ k) in the tree and a vertex v_i of degree d_i in the interval v_{k1}, ..., v_{k2} can be solved by first computing a partition of the interval into at most four parts (exactly four when d_i = 4). For the first subinterval, we compute, for 1 ≤ m ≤ s, the best tree on m nodes containing v_i, with its other nodes only from this subinterval and with v_i of degree one in this tree. This computation takes O(nk) time, since there are at most s ≤ k trees to be computed and, for each m, there are at most n nodes with which v_i can share the single edge in the best tree. Next, we include the next subinterval and compute, for 1 ≤ m ≤ s, the best tree on m nodes containing v_i and nodes from these two subintervals, where v_i has degree two, with one edge to a node in the first and one edge to a node in the second subinterval. This set of trees can also be computed in O(nk) time, given the set of trees for the first subinterval, as follows. First, compute the best tree on m nodes, for 1 ≤ m ≤ s, containing v_i and nodes only in the second subinterval, where v_i has exactly one edge to a node in this subinterval, in O(nk) time as before. Using these values and the analogous set of values for the first subinterval, the best trees on m nodes for the first two subintervals can be obtained in O(k^2) = O(nk) time, since each of the s ≤ k trees requires looking up at most s different pairs of trees, one from each subinterval. This method can be extended to compute the solution for the whole set of four subintervals in O(nk) time. Since there are O(n^3) ways to partition a given interval into four subintervals, the recurrence for this subproblem can be solved in O(kn^4) time. So the total time to solve one subproblem is O(kn^4). Since there are a total of O(kn^3) subproblems, the total running time of the algorithm is O(k^2 n^7).
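The step that merges the tables of two subintervals is a (min, +) convolution over tree sizes. A minimal sketch, assuming hypothetical tables left[a] and right[b] that hold the weight of the best tree on a (resp. b) nodes that contains v_i and in which v_i has exactly one edge into the corresponding subinterval, with ∞ where no such tree exists. Because both trees share v_i, they combine into a tree on a + b − 1 nodes in which v_i has degree two.

```python
import math


def combine(left, right, s):
    """(min, +) merge of two size-indexed tables of trees that share v_i.

    left[a], right[b]: best weight of a tree on a (resp. b) nodes containing v_i,
    with v_i having one edge into that subinterval (indices 0 and 1 unused).
    Returns out[m] = best weight of a merged tree on m nodes, for m <= s.
    """
    out = [math.inf] * (s + 1)
    for a in range(2, min(len(left), s + 1)):
        for b in range(2, min(len(right), s + 1)):
            m = a + b - 1  # v_i is counted in both trees
            if m > s:
                break
            out[m] = min(out[m], left[a] + right[b])
    return out
```

Each call costs O(s^2) = O(k^2); folding it over the remaining subintervals one at a time gives the per-partition cost used in the analysis above.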

We now provide a faster algorithm to find the optimal kMST in the case when all n points lie on a circle. We assume that no two points are diametrically opposite.

Lemma 5.4 Given n points v_1, v_2, ..., v_n on a circle, no vertex in an optimal kMST has degree more than 2.

Proof: Suppose point v_p in an optimal kMST has degree greater than 2. Consider the diameter passing through v_p. Since no point is diametrically opposite v_p, by the pigeonhole principle at least two neighbors of v_p lie on one side of this diameter. Let these neighbors be v_q and v_r, where v_q is closer to v_p than v_r. Then, since ∠v_p v_q v_r is obtuse, we can replace v_p v_r by v_q v_r to get a smaller tree. □

Lemma 5.4 implies that if the points lie on a circle, then every optimal kMST is a path. Moreover, if the path "zig-zags", then we can replace the crossing edge with a shorter edge. Thus we have the following lemma.

Lemma 5.5 Given n points v_1, v_2, ..., v_n on a circle, let a minimum-length k-path on these points be v_{i_1}, ..., v_{i_k}. Then the line segment joining v_{i_1} and v_{i_k}, together with the k-path, forms a convex k-gon.

Proof: By Lemma 5.4, the minimum-length k-path is also the minimum-length kMST. Suppose the line segment joining v_{i_1} and v_{i_k}, together with the minimum k-path, does not form a convex k-gon. Then there exists a zig-zag in the path, as shown in Figure 7. Say the center of the circle lies to the right of the edge (a, b); then we can replace (a, b) by the edge (b, c) to get a smaller kMST, which contradicts the fact that the k-path we started out with was optimal. □

Lemmata 5.4 and 5.5 lead to a straightforward dynamic programming algorithm to compute an optimal kMST for points on a circle in O(n^3) time.
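A minimal sketch of one such dynamic program, relying on Lemmas 5.4 and 5.5 (points listed in circular order, no two diametrically opposite): an optimal kMST is a path whose vertices appear in circular order, so for each starting point we compute the shortest path through exactly m points that only moves forward around the circle. This plain version runs in O(kn^3) time, not the O(n^3) bound stated above, and is an illustration rather than the authors' algorithm; giving points as polar angles and the function name are assumptions.

```python
import math


def kmst_on_circle(angles, k, radius=1.0):
    """Length of an optimal kMST for points on a circle.

    angles: polar angles of the points, listed in circular order.
    Assumes (Lemmas 5.4-5.5) that an optimal kMST is a path visiting its points
    in circular order, starting somewhere and moving in one direction.
    """
    n = len(angles)
    if k <= 1:
        return 0.0
    if k > n:
        return math.inf

    def chord(i, j):
        # Euclidean distance between points i and j on the circle.
        return 2 * radius * abs(math.sin((angles[i] - angles[j]) / 2))

    best = math.inf
    for start in range(n):
        order = [(start + t) % n for t in range(n)]  # points in circular order
        # dp[t]: shortest forward path with m points from order[0] to order[t].
        dp = [math.inf] * n
        dp[0] = 0.0  # m = 1
        for m in range(2, k + 1):
            new = [math.inf] * n
            for t in range(m - 1, n):
                for u in range(m - 2, t):
                    if dp[u] < math.inf:
                        new[t] = min(new[t], dp[u] + chord(order[u], order[t]))
            dp = new
        best = min(best, min(dp))
    return best
```

For three points at angles 0, 2 and 4 radians on the unit circle, the optimal 3-path drops the longest of the three chords.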
Figure 7: Illustration of Lemma 5.5.

6 Short trees and short small trees 
6.1 Short trees 

In this subsection, we prove our results on short trees. First, we address the minimum-diameter k-tree problem: given a graph with nonnegative edge weights, find a tree of minimum diameter spanning at least k nodes.

Recall that the diameter of a tree is the maximum distance (path length) between any pair of nodes in the tree. We introduce the notion of subdividing an edge in a weighted graph. A subdivision of an edge e = (u, v) of weight w_e is the replacement of e by two edges e_1 = (u, r) and e_2 = (r, v), where r is a new node and the weights of e_1 and e_2 sum to w_e. Consider a minimum-diameter k-tree. Let x and y be the endpoints of a longest path in the tree. The weight D of this path is the diameter of the tree. Consider the midpoint of this path between x and y. If it falls in the interior of an edge, we can subdivide the edge by adding a new vertex as specified above. The key observation is that there exist at least k vertices at a distance at most D/2 from this midpoint. This immediately motivates an algorithm for the case when the weights of all edges are integral and bounded by a polynomial in the number of nodes. In this case, all such potential midpoints lie at half-integral points along edges, of which there are only a polynomial number. Corresponding to each candidate point, there is a smallest distance from this point within which there are at least k nodes. We choose the point with the least such distance and output the breadth-first search (bfs) tree rooted at this point, appropriately truncated to contain only k nodes.
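A minimal sketch of the integral-weight case, under the assumption that weights are positive integers: doubling every weight makes each half-integral candidate midpoint an integral point, and subdividing each edge into unit pieces turns the search into one breadth-first search per vertex (original or subdivision) for the smallest radius that reaches k original nodes. In the doubled graph that radius equals 2 · (D/2) = D, so it is returned directly. The function name and edge-list format are hypothetical; the graph grows by the total edge weight, which is why the weights must be polynomially bounded.

```python
from collections import deque


def min_diameter_k_tree_integral(n, edges, k):
    """Minimum diameter of a tree spanning at least k of the n original nodes.

    edges: (u, v, w) with positive integer w. Each weight is doubled and every edge
    is split into unit edges, so every half-integral point becomes a vertex.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        prev = u
        for _ in range(2 * w - 1):  # 2w unit edges need 2w - 1 interior nodes
            adj.append([])
            r = len(adj) - 1
            adj[prev].append(r)
            adj[r].append(prev)
            prev = r
        adj[prev].append(v)
        adj[v].append(prev)

    best = None
    for c in range(len(adj)):  # every candidate midpoint
        dist = {c: 0}
        queue = deque([c])
        found = 1 if c < n else 0  # ids below n are original nodes
        radius = 0 if found >= k else None
        while queue and radius is None:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                    if y < n:
                        found += 1
                        if found >= k:
                            radius = dist[y]  # k-th nearest original node
                            break
        if radius is not None and (best is None or radius < best):
            best = radius
    return best  # radius in doubled units == diameter in original units
```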

When the edge weights are arbitrary, there are too many candidate midpoints to check in this fashion. However, we can use a graphical representation of the distance of any node from any point along a given edge to bound the search for candidate points. We can think of an edge e = (u, v) of weight w as a straight line of length w between its endpoints. For any node x in the graph, consider the shortest path from x to the point along the edge e at distance ℓ (≤ w) from u. The length of this path is the minimum of ℓ + d(x, u) and w − ℓ + d(v, x). We can plot this distance of the node x as a function of ℓ. The resulting plot is a piecewise linear bitonic curve that we call the roof curve of x in e (see Figure 8). For each edge e, we plot the roof curves of all the vertices of the graph in e. For any candidate point in e, the minimum diameter of a k-tree centered at this point can be determined by projecting a ray upwards from this point in the plot and determining the least distance at which it intersects the roof curves of at least k distinct nodes. The best candidate point for a given edge is one with the minimum such distance. Such a point can be determined by a simple line sweep algorithm on the plot. Determining the best midpoint over all edges gives the midpoint of the minimum-diameter k-tree. This proves Theorem 1.7.
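A minimal sketch of the roof-curve idea for arbitrary nonnegative weights. Instead of a line sweep, it uses the observation that the k-th lowest roof curve on an edge is piecewise linear with slopes ±1, so its minima lie at an endpoint of the edge or where a rising piece ℓ + d(u, x) meets a falling piece w − ℓ + d(v, y), that is, at ℓ = (w + d(v, y) − d(u, x))/2; all O(n^2) such candidates per edge are evaluated directly. All-pairs distances come from Floyd-Warshall. This brute-force candidate enumeration is our simplification (slower than a sweep), the function name is hypothetical, and 1 ≤ k ≤ n is assumed.

```python
import math


def min_diameter_k_tree(n, edges, k):
    """Minimum diameter of a tree spanning at least k nodes (nonnegative weights)."""
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]

    def kth_roof_height(u, v, w, ell):
        # k-th smallest roof-curve height at distance ell from u along (u, v)
        heights = sorted(min(ell + d[u][x], w - ell + d[v][x]) for x in range(n))
        return heights[k - 1]

    best = min(sorted(d[u])[k - 1] for u in range(n))  # centers at vertices
    for u, v, w in edges:
        candidates = {0.0, float(w)}
        for x in range(n):
            for y in range(n):
                # rising piece of x meets falling piece of y
                ell = (w + d[v][y] - d[u][x]) / 2
                if 0 <= ell <= w:  # also drops inf / nan from unreachable nodes
                    candidates.add(ell)
        for ell in candidates:
            best = min(best, kth_roof_height(u, v, w, ell))
    return 2 * best  # radius around the best midpoint, doubled
```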

The following lemma gives yet another way to implement the polynomial time algorithm 
for finding a tree of minimum diameter spanning k nodes. 

Lemma 6.1 Given two vertices v_i and v_j in a graph such that every other vertex is within distance d_i of v_i or d_j of v_j, it is possible to find two trees, one rooted at v_i of depth at most d_i and one rooted at v_j of depth at most d_j, which partition the set of all vertices.

Proof: Consider the shortest-path trees T_i and T_j rooted at v_i and v_j, of depth d_i and d_j, respectively. Every vertex occurs in one tree or in both trees. Consider a vertex v_p that occurs in both trees. If d_i − depth_{T_i}(v_p) is greater than d_j − depth_{T_j}(v_p), then the same is true of all descendants of v_p in T_j. Hence we can remove v_p and all its descendants from T_j, since we are guaranteed that all these vertices occur in T_i. Repeating this procedure bottom-up, we get two trees satisfying the required conditions and partitioning the vertex set. □

The above lemma motivates the following alternate algorithm for finding a minimum-diameter tree spanning at least k nodes. For each vertex v_i in the graph, compute the shortest distance d_i such that there are k vertices within distance d_i of v_i. For each edge (v_i, v_j), compute the least d'_i + d'_j such that there are k vertices within distance d'_i of v_i or d'_j of v_j. Then the least value among all the 2d_i's and all the (d'_i + d'_j + w(v_i, v_j))'s is the diameter of the k-tree with least diameter.

We now address the results in the third row of Table 1. 

Lemma 6.2 If the r_ij values are drawn from the set {a, b} and the d_ij values from {0, c}, then the minimum-communication-cost spanning tree can be computed in polynomial time.

Proof: When the d_ij values are all uniform, Hu [19] observed that the Gomory-Hu cut tree with the r_ij values as capacities is a minimum-communication-cost tree. We can use this result to handle the case when zero-cost d_ij edges are allowed as well. We contract each connected component of the graph formed by the zero-cost d_ij edges into a supernode. The requirement value r'_{IJ} between two supernodes V_I and V_J is the sum of the requirement values r_ij such that i ∈ V_I and j ∈ V_J. Now we find a Gomory-Hu cut tree on the supernodes using the r'_{IJ} values as capacities. By choosing an arbitrary spanning tree of zero-d_ij-valued edges within each supernode and connecting them to the Gomory-Hu tree, we get a spanning tree of the whole graph. It is easy to verify that this is a minimum-communication-cost spanning tree in this case. □

Figure 8: A roof curve of a node x in edge e = (u, v), plotting min{d(u, x) + ℓ, d(v, x) + w − ℓ} against ℓ.
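Hu's observation for the uniform case can be tried out directly. The sketch below is a minimal illustration under stated assumptions: every d_ij = 1 (no zero-cost contraction), requirements are given as a symmetric matrix r, and the Gomory-Hu cut tree is built with Gusfield's simplified procedure using Edmonds-Karp max flow for each of the n − 1 cuts. This construction is our choice, not one prescribed in the text, and the helper names min_cut, gomory_hu_tree and communication_cost are hypothetical. The last helper evaluates Σ_{i<j} r_ij times the number of tree edges between i and j.

```python
from collections import deque


def min_cut(cap, s, t):
    """Edmonds-Karp on a symmetric capacity matrix: (cut value, s-side of a min cut)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    value = 0
    while True:
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:  # BFS for a shortest augmenting path
            x = queue.popleft()
            for y in range(n):
                if prev[y] == -1 and cap[x][y] - flow[x][y] > 0:
                    prev[y] = x
                    queue.append(y)
        if prev[t] == -1:  # no augmenting path: reachable set is the s side
            return value, {v for v in range(n) if prev[v] != -1}
        bottleneck, y = float("inf"), t
        while y != s:
            bottleneck = min(bottleneck, cap[prev[y]][y] - flow[prev[y]][y])
            y = prev[y]
        y = t
        while y != s:
            flow[prev[y]][y] += bottleneck
            flow[y][prev[y]] -= bottleneck
            y = prev[y]
        value += bottleneck


def gomory_hu_tree(r):
    """Gusfield's Gomory-Hu cut tree; returns edges (node, parent, cut value)."""
    n = len(r)
    parent, weight = [0] * n, [0] * n
    for s in range(1, n):
        t = parent[s]
        f, side = min_cut(r, s, t)
        weight[s] = f
        for i in range(n):
            if i != s and i in side and parent[i] == t:
                parent[i] = s
        if parent[t] in side:  # re-hang s between t and t's old parent
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], f
    return [(i, parent[i], weight[i]) for i in range(1, n)]


def communication_cost(r, tree_edges):
    """Sum over pairs i < j of r[i][j] times the hop distance in the tree."""
    n = len(r)
    adj = [[] for _ in range(n)]
    for a, b, _ in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    total = 0
    for i in range(n):
        hops = [-1] * n
        hops[i] = 0
        queue = deque([i])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if hops[y] == -1:
                    hops[y] = hops[x] + 1
                    queue.append(y)
        total += sum(r[i][j] * hops[j] for j in range(i + 1, n))
    return total
```

On three nodes with r_01 = 10 and r_02 = r_12 = 1, the cut tree is a star at node 0 with communication cost 13, which is the best of the three possible spanning trees.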

Lemma 6.3 When all the d_ij values are uniform and there are at most two distinct r_ij values (say a and b), the minimum-diameter-cost spanning tree can be computed in polynomial time.

Proof: Let the higher of the two r_ij values be a. If the edges with requirement a form a cyclic subgraph, then any spanning tree has diameter cost at least 2a, since a tree cannot contain every edge of a cycle and so some pair with requirement a is at distance at least 2. In this case, any star is an optimal solution. Otherwise, consider the forest of edges with requirement a. Determine a center for each tree in this forest. Consider the tree formed by connecting these centers in a star, whose root is a center of the tree of largest diameter in the forest. If the diameter cost of the resulting tree is less than 2a, it is easy to see that this tree has optimum diameter cost. Otherwise, any star tree on all the nodes has diameter cost 2a and is optimal. Note that we can extend this solution to allow zero-cost d_ij edges by using contractions as before. □

Now we address the results in the fourth row of Table 1. 

Lemma 6.4 The minimum-diameter-cost spanning tree problem is NP-complete even when the r_ij's and d_ij's take on at most two distinct values.

Proof: We use a reduction from 3SAT. Given an instance, we form a graph that contains a special node T (the "true" node), a node for each literal, and a node for each clause. We use two d_ij values, c and d, where we assume c < d. Each literal is connected to its negation with an edge of distance c. The true node is connected to every literal with an edge of distance c. Each clause is connected to the three literals that it contains with edges of distance c. All other edges in the graph have distance d. Now we specify the requirements on the edges. We use requirement values from {a, 4a}, where a ≠ 0. The requirement value of an edge between a literal and its negation is 4a. The requirement value of all other edges is a (see Figure 9). Assuming that d > 4ac, it is easy to check that there is a spanning tree of this graph with diameter cost at most 4ac if and only if the 3SAT formula is satisfiable. □

6.2 Short small trees 

Finally we prove Theorem 1.8. We prove the theorem for the communication tree case; the proof of the other part is similar. Suppose there is a polynomial-time M-approximation algorithm for the minimum-communication-cost k-tree problem where all the d_ij values are one and all r_ij values are nonnegative. Then we show that the k-independent set problem can be solved in polynomial time. The latter problem is well known to be NP-complete [15]. Given the graph G of the k-independent set problem, produce the following instance of the communication k-tree problem: d_ij = 1 for every pair of nodes i and j, and r_ij = 1 if (i, j) is not an edge in G and r_ij = Mk(k − 1) + 1 otherwise. If G has an independent set of size k, then we can form a star on these k nodes (choosing an arbitrary node as the root). In the star, the distance between any pair of nodes is at most 2 and the r value for each pair is 1. Thus, the communication cost of an optimum solution is at most k(k − 1).
Figure 9: Reduction from an instance of 3SAT to the minimum-diameter-cost spanning tree problem.

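The approximation algorithm will therefore return a solution of cost at most Mk(k − 1). The nodes in this solution are independent in G by the choice of r_ij for the edges (i, j) of G: a single such pair in the solution would already contribute at least Mk(k − 1) + 1. On the other hand, if there is no independent set of size k in G, the communication cost of any k-tree is greater than Mk(k − 1). Thus the approximation algorithm decides whether G has an independent set of size k.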

7 Closing remarks 
7.1 Future research 

A natural question is whether there are approximation algorithms for the kMST problem that provide better performance guarantees than those presented in this paper. An interesting observation in this regard is the following. Any edge in an optimal kMST is a shortest path between its endpoints. This observation allows us to assume without loss of generality that the edge weights on the input graph obey the triangle inequality. Although we have been unable to exploit the triangle inequality in our algorithms, it is possible that this remark holds the key to improving our results. In this direction, Garg and Hochbaum [17] have recently given an O(log k)-approximation algorithm for the kMST problem for points in the plane, using an extension of our lower-bounding technique in Section 4.

Table 1 is incomplete. It would be interesting to know the complexity of the minimum-diameter-cost spanning tree problem when the distance values are uniform. Note that any star tree on the nodes provides a 2-approximation to the minimum-diameter-cost spanning tree in this case. The above problem can be shown to be polynomial-time equivalent to the following tree reconstruction problem: given integral nonnegative distances d_ij for every pair of vertices, does there exist a spanning tree on these nodes such that the distance between i and j in the tree is at most d_ij?
7.2 Maximum acyclic subgraph 

In the course of our research we considered the k-forest problem: given an undirected graph, is there a set of k nodes that induces an acyclic subgraph? The optimization version of this problem is the maximum acyclic subgraph problem. Since this problem is complementary to the minimum feedback vertex set problem [15], NP-completeness follows. While the feedback vertex set problem is 4-approximable [6], we can show that the maximum acyclic subgraph problem is hard to approximate within a reasonable factor, using an approximation-preserving transformation from the maximum independent set problem [5]. This same result has also been derived in a more general form in [24].

Theorem 7.1 There is a constant ε > 0 such that the maximum acyclic subgraph problem cannot be approximated within a factor Ω(n^ε) unless P = NP.

Proof: Note that any acyclic subgraph of size S contains an independent set of size at least S/2, since acyclic subgraphs are bipartite and each side of the bipartition is an independent set. Further, every independent set is also an acyclic subgraph. These two facts show that the existence of a p-approximation algorithm for the maximum acyclic subgraph problem implies the existence of a 2p-approximation algorithm for the maximum independent set problem. But by the result in [5] we know that there is a constant ε > 0 such that the maximum independent set problem cannot be approximated within a factor Ω(n^ε) unless P = NP. Hence, the same is true of the maximum acyclic subgraph problem. □
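The first step of the proof, extracting an independent set of at least half the nodes from an acyclic subgraph, can be sketched as follows: 2-colour each tree of the forest by breadth-first-search depth parity and keep the larger colour class. The function name and the (nodes, edges) input format are hypothetical; the input is assumed to be acyclic, which is what makes the 2-colouring proper.

```python
from collections import deque


def independent_set_from_forest(nodes, edges):
    """Independent set with at least half the nodes of an acyclic graph.

    A forest is bipartite: colour each tree by BFS depth parity; each colour
    class is independent, and the larger one has at least len(nodes) / 2 nodes.
    """
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = {}
    for root in nodes:
        if root in colour:
            continue
        colour[root] = 0
        queue = deque([root])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in colour:
                    colour[y] = 1 - colour[x]
                    queue.append(y)
    even = [v for v in nodes if colour[v] == 0]
    odd = [v for v in nodes if colour[v] == 1]
    return even if len(even) >= len(odd) else odd
```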

Acknowledgements: The authors wish to thank Alex Zelikovsky and Naveen Garg for pointing out that our algorithm for points in the plane extends to the rectilinear case too.